perm filename GENERA[W86,JMC]4 blob
sn#816380 filedate 1986-05-04 generic text, type C, neo UTF8
genera[w86,jmc] Generality in AI, for old Turing lecture
GENERALITY IN ARTIFICIAL INTELLIGENCE
by John McCarthy, Stanford University
My 1971 Turing Award Lecture was entitled ``Generality in
Artificial Intelligence''. The topic turned out to be
overambitious: I discovered that I was unable to put my thoughts on
the subject in a satisfactory written form at that time. It would have
been better to review previous work rather than attempt something
new, but such wasn't my custom at the time.
I am grateful to the ACM for the opportunity to try again.
Fortunately for this project, although perhaps unfortunately for our
science, the problem of generality in AI is almost as unsolved as ever,
although I can now use ideas that were not available in 1971. This paper
relies heavily on such ideas.
It was obvious in 1971 and indeed in 1958 that AI programs
suffered from a lack of generality. It is still obvious, and now there
are many more details. The gross symptom is that a small addition to the
idea of a program often involves a complete rewrite beginning with the
data structures. Some progress has been made in modularizing data
structures, but small modifications of the search strategies are even less
likely to be accomplished without rewriting.
When we take the logic approach to AI, lack of generality shows up in that
the axioms we devise to express common sense knowledge are too restricted
in their applicability to be put in a general common sense database that
could be used by any program needing knowledge about the phenomenon in
question.
Here is a list of some of the ideas with dates of proposals to
apply them to AI.
1. logic as a source of generality in AI (1958)
2. formalized non-monotonic reasoning (1977)
3. formalization of concepts (1979)
4. formalization of contexts (pending)
5. elaboration tolerance (pending)
6. separation of epistemology from heuristics (1969)
7. representation of behavior by program [Friedberg 1959]
8. logic programming (1970s)
9. production systems (1970s)
Friedberg [1959] discussed a completely general way of
representing behavior and provided a way of learning to improve it.
Namely, the behavior is represented by a computer program and
learning is accomplished by making random modifications to the
program and testing the modified program. The Friedberg approach
was successful in learning only how to move a single bit from
one memory cell to another, and its scheme of rewarding instructions
involved in successful runs by reducing their probability of modification
was shown by Simon [196x] to be inferior to testing each program thoroughly
and completely scrapping any program that wasn't perfect.
Friedberg then abandoned the field, and no one seems to have attempted
to follow up the idea of learning by modifying whole programs.
The defect of the Friedberg approach is that while representing
behaviors by programs is entirely general, modifying behaviors by
small modifications to the programs is very special. A small
conceptual modification to a behavior is usually not represented
by a small modification to the program, especially if machine
language programs are used and any one small modification to the
text of a program is considered as likely as any other. It would
be more plausible to attempt something more analogous to genetic
evolution wherein copies of subroutines would be made and some
copies would be modified and others left unchanged. The learning
system would then experiment whether it was advantageous to change
certain calls of the original subroutine to calls of the modified
subroutine. Most likely even this wouldn't work, because the
relevant small modifications of behavior would be unlikely to
be representable as calls to modified subroutines.
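Friedberg's mutate-and-test scheme can be sketched in a few lines. The following toy version learns the bit-moving task mentioned above; the three-cell machine and its copy-only instruction set are illustrative assumptions, not Friedberg's actual machine, and the acceptance rule (keep any mutation that scores at least as well) is a simplification of his reward scheme:

```python
import random

# A "program" is a list of (src, dst) copy instructions acting on a
# small memory, in the spirit of Friedberg's experiments.

def run(program, memory):
    mem = list(memory)
    for src, dst in program:
        mem[dst] = mem[src]  # each instruction copies one cell to another
    return mem

def score(program):
    # The task from the text: move a bit from cell 0 to cell 1.
    return 1 if run(program, [1, 0, 0])[1] == 1 else 0

def random_instruction(rng):
    return (rng.randrange(3), rng.randrange(3))

def learn(trials=500, length=2, seed=0):
    # Learning by random modification: mutate one instruction at a
    # time and keep the candidate if it scores at least as well.
    rng = random.Random(seed)
    best = [random_instruction(rng) for _ in range(length)]
    for _ in range(trials):
        candidate = list(best)
        candidate[rng.randrange(length)] = random_instruction(rng)
        if score(candidate) >= score(best):
            best = candidate
    return best
```

The sketch also makes the defect visible: the mutation operator acts on program text, so there is no reason for a small conceptual change in behavior to be reachable by a small textual change.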
It seemed to me that small modifications in behavior are
more often representable as small modifications in beliefs about
the world, and this requires a system that represents beliefs
explicitly.
[McCarthy 1960] said, ``If one wants a machine to be able to
discover an abstraction, it seems most likely that the machine must be
able to represent this abstraction in some relatively simple way''.
The [McCarthy 1960] idea for increasing generality was to
use logic to express facts in a way independent of the way the facts
might subsequently be used. It seemed then and still seems that
humans communicate mainly in declarative sentences rather than in programming
languages for good objective reasons that will apply whether the
communicator is a human, a creature from Alpha Centauri or a computer
program. Moreover, the advantages of declarative information also
apply to internal representation. The advantage of declarative
information is one of generality. The fact that when two objects
collide they make a noise may be used in particular situations to make
a noise, to avoid making noise, to explain a noise or to explain the
absence of noise. (I guess those cars didn't collide, because while
I heard the squeal of brakes, I didn't hear a crash.) It can also
be used in the design of mechanisms, e.g. the bell in a telephone. It
turns out that using a fact in such a general way is more difficult
for AI than using it in specific situations. For example, facts
represented as fragments of Prolog program can be used flexibly in
specific situations but are difficult to incorporate in design
programs without being expressed quite differently.
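The collision fact of the example might be written as a single declarative axiom (the predicate names here are illustrative):

```latex
% When two objects collide in situation s, there is a noise.
\forall x\, y\, s.\ collides(x, y, s) \supset noise(s)
```

Used forward, it predicts a noise; used contrapositively, the absence of a noise lets one conclude that no collision occurred, as in the squealing-brakes example; and a designer can reason backward from a desired $noise(s)$ to arranging a collision, as in the telephone bell.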
While the 1958 idea was well received, very few attempts
were made to implement it in the immediately following years,
the main one being Fischer Black's Harvard PhD thesis of
196x. I spent most of my time on what I regarded as preliminary
projects, mainly LISP. My main reason for not attempting an
implementation was
[McCarthy and Hayes 1969] made the distinction between
epistemological and heuristic aspects of the AI problem and asserted
that generality is more easily studied epistemologically. The distinction
is that the epistemology is completed when the facts available have
as a consequence that a certain strategy is appropriate to achieve
the goal, while the heuristic problem involves the search that
finds the appropriate strategy.
Implicit in [McCarthy 1960] was the idea of a general purpose
common sense database. The common sense information possessed by humans
would be written as logical sentences and included in the database. Any
goal-seeking program could consult the database for the facts needed
to decide how to achieve its goal. Especially prominent in the database
would be facts about the effects of actions. The much studied example
is the set of facts about the effects of a robot trying to move objects
from one location to another.
This led in [McCarthy and Hayes 1969] to
the {\it situation calculus} which was intended to provide a way of
expressing the consequences of actions independent of the problem.
Unfortunately, it wasn't very feasible to use the situation calculus
in the manner proposed. In the first place, the use of general
purpose theorem provers made the programs run too slowly, since
the theorem provers of 1969 [Green 1969] had no way of controlling
the search. This led to STRIPS [Fikes and Nilsson 197x], which
reduced the use of logic to reasoning within a situation. However,
the STRIPS formalizations were much more special than the full situation calculus.
The facts that were included in the axioms had to be delicately chosen
in order to avoid the introduction of contradictions arising from
the failure to delete a sentence that wouldn't be true in the
situation that resulted from an action.
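A typical situation-calculus effect axiom, written schematically in the style of [McCarthy and Hayes 1969] (the particular action and fluent are illustrative, not quotations):

```latex
% Moving object x to location l results in a situation where x is at l.
\forall x\, l\, s.\ holds(at(x, l),\ result(move(x, l),\ s))
```

STRIPS, by contrast, attached add- and delete-lists to each operator, so the effects of an action were computed rather than deduced, at the cost in generality described above.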
The second problem with the situation calculus axioms is that
they were again not general enough. This was the {\it qualification problem},
and a possible way around it wasn't discovered until the late 1970s.
Consider putting an axiom in a {\it common sense database} asserting that
birds can fly. Clearly the axiom must be {\it qualified} in some way since
penguins, dead birds and birds whose feet are encased in concrete
can't fly. A careful construction of the axiom might succeed in
including the exceptions of penguins and dead birds, but clearly
we can think up as many additional exceptions like birds with
their feet encased in concrete as we like. Formalized non-monotonic
reasoning (see [McCarthy 1980, 1986], [Doyle 1978], [McDermott and
Doyle 1980] and [Reiter 1980]) provides a formal way of saying that
a bird can fly unless there is an abnormal circumstance and reasoning
that only the abnormal circumstances whose existence follows from the
facts being taken into account will be considered.
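In the abnormality formalism of [McCarthy 1986], the flight axiom and its exceptions take a form like the following (simplified; the published version distinguishes aspects of abnormality):

```latex
\forall x.\ bird(x) \land \lnot ab(x) \supset flies(x)
\forall x.\ penguin(x) \supset ab(x)
\forall x.\ dead(x) \supset ab(x)
```

Circumscribing $ab$ minimizes the abnormal objects, so a bird not known to be exceptional is concluded to fly, while further exceptions can be added as new $ab$ axioms without rewriting the flight axiom.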
Non-monotonicity has considerably increased the possibility of
expressing general knowledge about the effects of events in the situation
calculus. It has also provided a way of solving the {\it frame problem},
which constituted another obstacle to generality that was already
noted in [McCarthy and Hayes 1969]. The frame problem (The term has
been variously used, but I had it first.) occurs when there are
several actions available each of which changes certain features
of the situation. Somehow it is necessary to say that an action
changes only the features of the situation to which it directly
refers. When there is a fixed set of actions and features, it can
be explicitly stated which features are unchanged by an action, even
though it may take a lot of axioms. However, if we imagine that
additional features of situations and additional actions may be
added to the database, we face the problem that the axiomatization
of an action is never completed. [McCarthy 1986] indicates how
to handle this using {\it circumscription}, but Lifschitz [1985]
has shown that this method needs to be improved, presumably by
some modification of the circumscription method of non-monotonic
reasoning.
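The nonmonotonic treatment of the frame problem can be written schematically as a single persistence axiom (again simplified from [McCarthy 1986]):

```latex
% A fluent p persists through an event e unless e abnormally affects it.
\forall p\, e\, s.\ holds(p, s) \land \lnot ab(p, e, s) \supset holds(p, result(e, s))
```

Circumscription then makes each action change only the fluents it is asserted to change, which is just the requirement stated above.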
Even with formalized non-monotonic reasoning, the general
commonsense database still seems elusive. The problem is writing
axioms that satisfy our notions of incorporating the general facts about
a phenomenon. Whenever we tentatively decide on some axioms, we
are able to think of situations in which they don't apply and
a generalization is called for. Moreover, the difficulties that
are thought of are often {\it ad hoc} like that of the bird with
its feet encased in concrete.
A possible way out involves formalizing the notion of context
and combining it with the circumscription method of non-monotonic
reasoning. We add a context parameter to the functions and predicates
in our axioms. Each axiom makes its assertion about a certain context.
Further axioms tell us that facts are inherited by more restricted
contexts unless exceptions are asserted. Each assertion is also
non-monotonically assumed to apply in any particular more general
context, but again there are exceptions. For example, the rules
about birds flying implicitly assume that there is an atmosphere
to fly in. In a more general context this might not be assumed.
It remains to determine how inheritance to more general contexts
differs from inheritance to more specific contexts.
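One plausible rendering of the inheritance rule just sketched, with every predicate name an illustrative assumption rather than an established notation:

```latex
% Facts inherit from a context c to a more specific context c'
% unless an exception (abnormality) is asserted.
\forall p\, c\, c'.\ holds(p, c) \land specializes(c', c)
  \land \lnot ab(p, c, c') \supset holds(p, c')
```

Inheritance upward, to a more general context, would presumably use a similar axiom, but how the two directions differ is exactly the open question.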
Even the above is unpleasantly vague, but it's a lot more
than could be said in 1971.
GENERA[W86,JMC] TEXed on \jmcdate at \time.
notes:
We need to be able to tell it metamathematics.
modality si, as a matter of generality
generality and lack thereof in Prolog
quote from 1958 paper about generality
[McCarthy 1958] was in part a reaction to the lack of generality
of the Samuel [195x and 19xx] checker program that learned how to
improve its ability to play checkers. Its behavior was determined
by the weights assigned to various features used in evaluating a checker
position. These features included the numbers of men and kings on
each side, control of the center and back rows, other features considered
important by checker players and still others of no previously known
checker importance that Samuel included
to see if the learning program would assign them significant weights.
(It didn't.) The program adjusted the weights of the features of
positions in order to predict as well as possible the moves considered
good in a large database of master games. The learning program was
quite successful, and one can conjecture that the ultimate weights
were close to optimal.
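The Samuel scheme amounts to scoring positions by a weighted sum of features and adjusting the weights from training pairs. A minimal sketch, in which the feature vectors and the update rule are illustrative assumptions, not Samuel's actual procedure:

```python
# Linear evaluation in the Samuel style: a position is reduced to a
# feature vector (men, kings, center control, ...) and scored by a
# weighted sum.  Training nudges the weights so that the move judged
# good in master play scores higher than an alternative.

def evaluate(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def train_step(weights, good_pos, bad_pos, rate=0.1):
    # Only update when the master move is not already preferred.
    if evaluate(weights, good_pos) <= evaluate(weights, bad_pos):
        weights = [w + rate * (g - b)
                   for w, g, b in zip(weights, good_pos, bad_pos)]
    return weights

# Toy data: (features of the master move, features of an alternative).
examples = [([3, 1, 2], [3, 1, 0]), ([2, 2, 1], [2, 2, 0])]
weights = [0.0, 0.0, 0.0]
for _ in range(20):
    for good, bad in examples:
        weights = train_step(weights, good, bad)
```

Whatever the training data, the score remains a linear function of the chosen features, which is the source of the limitation discussed below.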
However, even the best possible weights for the features considered
provided no way of taking into account certain checker stratagems. For
example, a situation can sometimes be created in which a king belonging
to one side prevents two single men from the other side from advancing.
If this situation can be maintained while on the rest of the board
even exchanges and promotions to kings occur, eventually the side
whose king is holding the two singles will outnumber the other side
by one on the rest of the board. This is a sufficient advantage
to win by a series of exchanges. Lookahead cannot be deep enough to
see the win when the situation is first created. There is probably
no setting of the parameters that will always give sufficient
importance to freeing the two blocked singles without leading to
wrong moves in other situations. The problem is that the space
of values of the Samuel parameters provides an insufficiently general
representation of checker strategy. Moreover, there is no way
of telling the Samuel program about the danger of letting a king
block two singles; the only way to get it in involves rewriting the
program.